[dataloader] Fix text filtering bug and speed up spectrum length calc #216
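1. Fixed the text-length check in `_filter` in data_utils.py. Previously the text was not passed through split() before its length was measured, so spaces were counted toward the text length.
2. Switched the audio sample-rate calculation in `_filter` in data_utils.py from torchaudio to the soundfile library, which makes it 5 times faster.
3. Added a progress bar to `_filter` in data_utils.py.
4. Added tools/compute_spec_length.py, which precomputes the spectrogram feature lengths with multiple threads and so saves data loading time. To use it, change the format of the original train.txt to filename|speaker|text|spec_length, where the fourth column holds the feature length.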
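
To illustrate items 1 and 4, here is a minimal sketch of the idea, not the code in this PR. The helper names (`token_length`, `parse_manifest_line`, `keep_sample`), the length thresholds, and the choice to treat the fourth column as optional are all assumptions made for the example.

```python
def token_length(text: str) -> int:
    """Count whitespace-separated tokens so that spaces do not inflate the length."""
    return len(text.split())


def parse_manifest_line(line: str):
    """Parse one `filename|speaker|text|spec_length` line.

    Assumption: the fourth column may be missing (old three-column format),
    in which case spec_length is returned as None.
    """
    fields = line.rstrip("\n").split("|")
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 '|'-separated fields, got {len(fields)}")
    filename, speaker, text = fields[:3]
    spec_length = int(fields[3]) if len(fields) == 4 else None
    return filename, speaker, text, spec_length


def keep_sample(line: str, min_text_len: int = 1, max_text_len: int = 190,
                max_spec_len: int = 1000) -> bool:
    """Hypothetical filter: thresholds are illustrative, not the project's values."""
    _, _, text, spec_length = parse_manifest_line(line)
    if not (min_text_len <= token_length(text) <= max_text_len):
        return False
    # With a precomputed length the audio file does not need to be opened here.
    return spec_length is None or spec_length <= max_spec_len
```

For a text such as `"a b c"`, `len(text)` gives 5 because it counts the two spaces, while `token_length(text)` gives 3, which is the difference item 1 describes.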